 Northamptonshire


Secret warehouse guards lost world of treasures found on HS2 route

BBC News

Treasures unearthed by hundreds of archaeologists so far during work on the controversial planned HS2 train line have been shown exclusively to the BBC. The 450,000 objects, which are being held in a secret warehouse, include a possible Roman gladiator's tag, a hand axe that may be more than 40,000 years old and 19th Century gold dentures. It is an unprecedented amount and array of items, which will yield new insights into Britain's past, says the Council for British Archaeology. Major building developments in the UK need land to be assessed by archaeologists as part of the planning process, to protect heritage sites. Since 2018 around 1,000 archaeologists have been involved in 60 digs along the route HS2 is set to take between London and Birmingham.


EU investigates Elon Musk's X over Grok AI sexual deepfakes

BBC News

The European Commission has launched an investigation into Elon Musk's X over concerns its AI tool Grok was used to create sexualised images of real people. It follows a similar announcement in January from the UK watchdog Ofcom. Regina Doherty, a member of the European parliament representing Ireland, said the Commission would assess whether manipulated sexually explicit images have been shown to users in the EU. A previous statement from X's Safety account said the social media platform had stopped Grok from digitally altering pictures of people to remove their clothing in jurisdictions where such content is illegal. But campaigners and victims said the ability to generate sexually explicit pictures using the tool should have never happened in the first place, and Ofcom said its investigation would remain ongoing.


Watch: BBC reporter tests AI anti-shoplifting tech

BBC News

Some major retailers and independent stores have introduced AI body scans, CCTV or facial recognition equipment to identify crimes like shoplifting.


Britain will be battered by giant HAILSTONES thanks to climate change: Huge ice balls 'could damage aircraft and properties', study warns

Daily Mail - Science & tech

Giant hailstones could soon become the norm in Britain - with climate change to blame.


Service, Solidarity, and Self-Help: A Comparative Topic Modeling Analysis of Community Unionism in the Boot and Shoe Union and Unite Community

Compton, Thomas

arXiv.org Artificial Intelligence

This paper presents a comparative analysis of community unionism (CU) in two distinct historical and organizational contexts: the National Boot and Shoe Union (B&S) in the 1920s and Unite Community in the 2010s-2020s. Using BERTopic for thematic modeling and cTF-IDF weighting, alongside word frequency analysis, the study examines the extent to which each union's discourse aligns with key features of CU, such as coalition-building, grassroots engagement, and action beyond the workplace. The results reveal significant differences in thematic focus and discursive coherence. While Unite Community demonstrates stronger alignment with outward-facing, social justice-oriented themes, the B&S corpus emphasizes internal administration, industrial relations, and member services, reflecting a more traditional, servicing-oriented union model. The analysis also highlights methodological insights, demonstrating how modern NLP techniques can enhance the study of historical labor archives. Ultimately, the findings suggest that while both unions engage with community-related themes, their underlying models of engagement diverge significantly, challenging assumptions about the continuity and universality of community unionism across time and sector.
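For readers unfamiliar with the cTF-IDF weighting mentioned above, the following is a minimal sketch of class-based TF-IDF as it is commonly described for BERTopic: all documents in a topic are pooled into one "class document", and each term is weighted by its within-topic frequency times log(1 + A / f_t), where A is the average number of words per topic and f_t is the term's frequency across all topics. The function name `class_tfidf` and the toy tokens are hypothetical; this is not the paper's pipeline or corpus.

```python
import math
from collections import Counter


def class_tfidf(docs_by_topic):
    """Score terms per topic with a class-based TF-IDF.

    docs_by_topic maps a topic label to a list of tokenised documents.
    Returns a dict: topic -> {term: score}.
    """
    if not docs_by_topic:
        return {}

    # Pool every document of a topic into a single bag of words.
    counts = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in docs_by_topic.items()
    }

    # Frequency of each term across all topics.
    total = Counter()
    for topic_counts in counts.values():
        total.update(topic_counts)

    # A: average number of words per topic.
    avg_words = sum(total.values()) / len(counts)

    scores = {}
    for topic, topic_counts in counts.items():
        n_words = sum(topic_counts.values())
        scores[topic] = {
            term: (freq / n_words) * math.log(1 + avg_words / total[term])
            for term, freq in topic_counts.items()
        }
    return scores


# Hypothetical toy corpus: "union" appears in both topics, so it is down-weighted.
toy = {
    "unite": [["community", "campaign", "union"], ["community", "housing", "union"]],
    "b_and_s": [["branch", "dues", "union"], ["members", "branch", "union"]],
}
toy_scores = class_tfidf(toy)
top_terms = {t: max(s, key=s.get) for t, s in toy_scores.items()}
```

In this toy case the shared word "union" scores lower than topic-specific words of the same frequency, which is the behaviour that makes the weighting useful for contrasting two corpora.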


Leveraging Semantic Triples for Private Document Generation with Local Differential Privacy Guarantees

Meisenbacher, Stephen, Chevli, Maulik, Matthes, Florian

arXiv.org Artificial Intelligence

Many works at the intersection of Differential Privacy (DP) in Natural Language Processing aim to protect privacy by transforming texts under DP guarantees. This can be performed in a variety of ways, from word perturbations to full document rewriting, and most often under local DP. Here, an input text must be made indistinguishable from any other potential text, within some bound governed by the privacy parameter $\varepsilon$. Such a guarantee is quite demanding, and recent works show that privatizing texts under local DP can only be done reasonably under very high $\varepsilon$ values. Addressing this challenge, we introduce DP-ST, which leverages semantic triples for neighborhood-aware private document generation under local DP guarantees. Through the evaluation of our method, we demonstrate the effectiveness of the divide-and-conquer paradigm, particularly when limiting the DP notion (and privacy guarantees) to that of a privatization neighborhood. When combined with LLM post-processing, our method allows for coherent text generation even at lower $\varepsilon$ values, while still balancing privacy and utility. These findings highlight the importance of coherence in achieving balanced privatization outputs at reasonable $\varepsilon$ levels.
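To make the role of $\varepsilon$ concrete, here is a minimal sketch of k-ary randomized response, a textbook local DP mechanism. It is not the DP-ST method; it only illustrates why small $\varepsilon$ values are demanding: the true input is kept with probability $e^{\varepsilon} / (e^{\varepsilon} + k - 1)$, which approaches uniform guessing as $\varepsilon$ shrinks or the domain size k grows. The function names and the toy domain are hypothetical.

```python
import math
import random


def keep_probability(epsilon, k):
    """Probability of reporting the true value under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def randomized_response(value, domain, epsilon, rng=random):
    """Report `value` truthfully with the keep probability, else a uniformly
    chosen different element of `domain`. Satisfies epsilon-local DP because
    any output is at most e^epsilon times more likely under one input than
    under another."""
    if value not in domain or len(domain) < 2:
        raise ValueError("value must be in a domain of at least two elements")
    if rng.random() < keep_probability(epsilon, len(domain)):
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


# Hypothetical toy domain of four candidate words.
domain = ["london", "birmingham", "leeds", "york"]
p_low = keep_probability(0.0, len(domain))            # no privacy budget: pure guessing
p_mid = keep_probability(math.log(3), len(domain))    # e^eps = 3
```

With four candidates, $\varepsilon = 0$ keeps the true word only a quarter of the time, and even $e^{\varepsilon} = 3$ keeps it only half the time; over a realistic vocabulary the keep probability is far smaller, which is consistent with the abstract's remark that coherent output under local DP tends to require high $\varepsilon$.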


Inter(sectional) Alia(s): Ambiguity in Voice Agent Identity via Intersectional Japanese Self-Referents

Fujii, Takao, Seaborn, Katie, Steeds, Madeleine, Kato, Jun

arXiv.org Artificial Intelligence

Conversational agents that mimic people have raised questions about the ethics of anthropomorphizing machines with human social identity cues. Critics have also questioned assumptions of identity neutrality in humanlike agents. Recent work has revealed that intersectional Japanese pronouns can elicit complex and sometimes evasive impressions of agent identity. Yet, the role of other "neutral" non-pronominal self-referents (NPSR) and voice as a socially expressive medium remains unexplored. In a crowdsourcing study, Japanese participants (N = 204) evaluated three ChatGPT voices (Juniper, Breeze, and Ember) using seven self-referents. We found strong evidence of voice gendering alongside the potential of intersectional self-referents to evade gendering, i.e., ambiguity through neutrality and elusiveness. Notably, perceptions of age and formality intersected with gendering as per sociolinguistic theories, especially boku and watakushi. This work provides a nuanced take on agent identity perceptions and champions intersectional and culturally-sensitive work on voice agents.


Predictors of Childhood Vaccination Uptake in England: An Explainable Machine Learning Analysis of Longitudinal Regional Data (2021-2024)

Noroozi, Amin, Esha, Sidratul Muntaha, Ghari, Mansoureh

arXiv.org Artificial Intelligence

Childhood vaccination is a cornerstone of public health, yet disparities in vaccination coverage persist across England. These disparities are shaped by complex interactions among geographic, demographic, socioeconomic, and cultural (GDSC) factors. Previous studies mostly rely on cross-sectional data and traditional statistical approaches that assess individual or limited sets of variables in isolation. Such methods may fall short in capturing the dynamic and multivariate nature of vaccine uptake. In this paper, we conducted a longitudinal machine learning analysis of childhood vaccination coverage across 150 districts in England from 2021 to 2024. Using vaccination data from NHS records, we applied hierarchical clustering to group districts by vaccination coverage into low- and high-coverage clusters. A CatBoost classifier was then trained to predict districts' vaccination clusters using their GDSC data. Finally, the SHapley Additive exPlanations (SHAP) method was used to interpret the predictors' importance. The classifier achieved high accuracies of 92.1%, 90.6%, and 86.3% in predicting districts' vaccination clusters for the years 2021-2022, 2022-2023, and 2023-2024, respectively. SHAP revealed that geographic, cultural, and demographic variables, particularly rurality, English language proficiency, the percentage of foreign-born residents, and ethnic composition, were the most influential predictors of vaccination coverage, whereas socioeconomic variables, such as deprivation and employment, consistently showed lower importance, especially in 2023-2024. Surprisingly, rural districts were significantly more likely to have higher vaccination rates. Additionally, districts with lower vaccination coverage had larger populations of residents whose first language was not English, who were born outside the UK, or who were from ethnic minority groups.


Bridget Phillipson eyes AI's potential to free up teachers' time

The Guardian

AI tools will soon be in use in classrooms across England, but the education secretary, Bridget Phillipson, has one big question she wants answered: will they save time? Attending a Department for Education-sponsored hackathon in central London last week, Phillipson listened as developers explained how their tools could compile pupil reports, improve writing samples and even assess the quality of soldering done by trainee electrical engineers. After listening to one developer extol their AI writing analysis tool as "superhuman", able to aggregate all the writing a pupil had ever done, Phillipson asked bluntly: "Do you know how much time it will have saved?" "That will be our next step," the developer admitted, less confidently. In an interview with the Guardian, Phillipson said her interest in AI was less futuristic and more practical.


Analyzing Similarity Metrics for Data Selection for Language Model Pretraining

Sam, Dylan, Chakrabarti, Ayan, Rostamizadeh, Afshin, Ramalingam, Srikumar, Citovsky, Gui, Kumar, Sanjiv

arXiv.org Artificial Intelligence

Many methods use similarity between training examples to curate pretraining datasets for language models, both for diversification and to select examples similar to high-quality data. However, similarity is typically measured with off-the-shelf embedding models that are generic or trained for tasks such as retrieval. This paper introduces a framework to analyze the suitability of embedding models specifically for data curation in the language model pretraining setting. We quantify the correlation between similarity in the embedding space and similarity in pretraining loss between different training examples, and how diversifying in the embedding space affects pretraining quality. We analyze a variety of embedding models in our framework, with experiments using the Pile dataset for pretraining a 1.7B parameter decoder-only language model. We find that the embedding models we consider are all useful for pretraining data curation. Moreover, a simple approach of averaging per-token embeddings proves to be surprisingly competitive with more sophisticated embedding models, likely because the latter are not designed specifically for pretraining data curation. Indeed, we believe our analysis and evaluation framework can serve as a foundation for the design of embedding models that specifically reason about similarity in pretraining datasets.
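The "averaging per-token embeddings" baseline can be sketched in a few lines, assuming each training example is already available as a list of per-token vectors. The helper names and the toy two-dimensional vectors below are hypothetical, and cosine similarity is used here as one common choice of similarity measure; the abstract does not specify which measure the authors use.

```python
import math


def mean_pool(token_embeddings):
    """Average a list of equal-length per-token vectors into one example vector."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    n_tokens = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in token_embeddings) / n_tokens for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical per-token embeddings for two short training examples.
example_a = [[1.0, 0.0], [0.0, 1.0]]
example_b = [[2.0, 0.0], [0.0, 2.0]]
pooled_a = mean_pool(example_a)
pooled_b = mean_pool(example_b)
similarity = cosine_similarity(pooled_a, pooled_b)
```

Pairwise similarities computed this way could then feed either selection (keep examples close to a high-quality reference set) or diversification (drop examples too close to ones already kept), the two curation uses the abstract names.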